Design and Implementation of Integer Dct Architectures for Hevc in Vlsi Technology
نویسنده
چکیده
High Efficiency Video Coding (HEVC) inverse transform for residual coding uses 2-D 4x4 to 32x32 transforms with higher precision as compared to H.264/AVC’s 4x4 and 8x8 transforms resulting in an increased hardware complexity. In this paper, an energy and areaefficient VLSI architecture of an HEVC-compliant inverse transform and dequantization engine is presented. We implement a pipelining scheme to process all transform sizes at a minimum throughput of 2 pixel/cycle with zero-column skipping for improved throughput. We use datagating in the 1-D Inverse Discrete Cosine Transform engine to improve energy-efficiency for smaller transform sizes. A high-density SRAM-based transpose memory is used for an area-efficient design. This design supports decoding of 4K Ultra-HD (3840x2160) video at 30 frame/sec. The inverse transform engine takes 98.1 kgate logic, 16.4 kbit SRAM and 10.82 pJ/pixel while the dequantization engine takes 27.7 kgate logic, 8.2 kbit SRAM and 1.10 pJ/pixel in 40 nm CMOS technology. Although larger transforms require more computation per coefficient, they typically contain a smaller proportion of non-zero coefficients. Due to this trade-off, larger transforms can be more energy-efficient. KeywordsHEVC, Inverse Discrete Cosine Transform, Transpose Memory, Data Gating.
منابع مشابه
Design and Implementation of a High Speed Systolic Serial Multiplier and Squarer for Long Unsigned Integer Using VHDL
A systolic serial multiplier for unsigned numbers is presented which operates without zero words inserted between successive data words, outputs the full product and has only one clock cycle latency. 
The multiplier is based on a modified serial/parallel scheme with two adjacent multiplier cells. Systolic concept is a well-known means of intensive computational task through replication of fu...
متن کاملDesign and Implementation of a High Speed Systolic Serial Multiplier and Squarer for Long Unsigned Integer Using VHDL
A systolic serial multiplier for unsigned numbers is presented which operates without zero words inserted between successive data words, outputs the full product and has only one clock cycle latency. The multiplier is based on a modified serial/parallel scheme with two adjacent multiplier cells. Systolic concept is a well-known means of intensive computational task through replication of func...
متن کاملFPGA implementation of Integer DCT for HEVC
-In this paper, area-efficient architectures for the implementation of integer discrete cosine transform (DCT) of different lengths to be used in High Efficiency Video Coding (HEVC) are proposed. An efficient constant matrix multiplication scheme can be used to derive parallel architectures for 1-D integer DCT of different lengths such as 4, 8, 16, and 32. Also power-efficient structures for fo...
متن کاملVlsi Implementation of Integer Dct Architectures for Hevc in Fpga Technology
High Efficiency Video Coding (HEVC) inverse transform for residual coding uses 2-D 4x4 to 32x32 transforms with higher precision as compared to H.264/AVC’s 4x4 and 8x8 transforms resulting in an increased hardware complexity. In this paper, an energy and area-efficient VLSI architecture of an HEVC-compliant inverse transform and dequantization engine is presented. We implement a pipelining sche...
متن کاملLow-Cost and High-Throughput Hardware Design for the HEVC 16x16 2-D DCT Transform
This article presents the hardware design of the 16x16 2-D DCT used in the new video coding standard, the HEVC – High Efficiency Video Coding. The transforms stage is one of the innovations proposed by HEVC, since a variable size transforms stage is available (from 4x4 to 32x32), allowing the use of transforms with larger dimensions than used in previous standards. The presented design explores...
متن کامل